Ƀ

WHAT?

WHY?

HOW?

MACHINE LEARNING TECHNIQUES FOR BITCOIN PRICE PREDICTION

EVALUATION METRICS

  • Coefficient of Determination

  • Root-Mean-Square Error

  • Mean Absolute Error

  • Directional Symmetry

DAILY PRICE PREDICTION - APPROACH

  • DATA: blockchain.info, 4 January 2012 - 13 April 2016 (split: 70/15/15)

  • Use a Genetic Algorithm (GA) to select the most relevant Bitcoin network features

  • Manually aid the GA feature selection by observing regression and residual plots

  • Perform a Multiple Linear Regression with the selected features to predict the prices

FEATURE SELECTION WITH NSGA-II

  • Multi-objective optimisation problem:

    • Minimise the feature set size to prevent overfitting

    • Maximise the predictive power (measured by a selected metric)

  • FITNESS FUNCTION: Multiple Linear Regression

  • FITNESS SCORE: determined by the given metric score and the feature set size
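A rough sketch of the idea (a simplified, single-objective GA, not the full NSGA-II used in the project): evolve boolean feature masks, scoring each mask by the R² of an MLR fit minus a size penalty that stands in for the second objective. Data, penalty weight and GA parameters are all illustrative.

```python
# Simplified GA feature selection on synthetic data. The real pipeline uses
# NSGA-II (two objectives, Pareto selection); here the objectives are collapsed
# into one score for brevity.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
# only features 0, 3 and 7 actually drive the target
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.1, size=n_samples)

def fitness(mask):
    """R^2 of an MLR fit on the selected features, penalised by set size."""
    if not mask.any():
        return -np.inf
    r2 = LinearRegression().fit(X[:, mask], y).score(X[:, mask], y)
    return r2 - 0.01 * mask.sum()   # size penalty stands in for the 2nd objective

pop = rng.random((20, n_features)) < 0.5            # random boolean chromosomes
for _ in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]         # elitist truncation selection
    children = parents.copy()
    swap = rng.random(children.shape) < 0.5         # uniform crossover
    children[swap] = parents[::-1][swap]
    children ^= rng.random(children.shape) < 0.05   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```

With elitism the best score never decreases, so after a few generations the mask typically shrinks to the genuinely predictive features.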

REGRESSION PLOTS

In [3]:
from IPython.display import Image, display

a = Image(filename='images/linear-regression_8_1.png')
b = Image(filename='images/linear-regression_8_2.png')
c = Image(filename='images/linear-regression_8_3.png')
d = Image(filename='images/linear-regression_8_4.png')
e = Image(filename='images/linear-regression_8_5.png')
f = Image(filename='images/linear-regression_8_6.png')
display(a, b, c, d, e, f)

MOST COMMONLY SELECTED FEATURES BY THE GA

  • Mining Revenue

  • Network Deficit per Day

  • Cost per Transaction

  • Total Output Value

  • Estimated Transaction Value (USD)

  • Trade vs. Transaction Volume Ratio

  • Trade Volume

RESULTS

Metric    Score
R2        0.6260
RMSE      47.5497
MAE       35.5567
DS        53.2189%
In [4]:
a = Image(filename='images/linear-regression_13_2.png')
b = Image(filename='images/linear-regression_13_3.png')

display(a,b)

RESIDUAL PLOTS

INTRADAY PRICE CHANGE PREDICTION - APPROACH

  • DATA:

    • Limit order book data: cryptoiq.com, 1 January 2016 - 1 May 2016

    • Ticker data: bitstamp.net, 1 February 2016 - 24 April 2016

  • Time series with frequencies of 10s, 30s, 1m, 5m and 10m

  • Technical indicators (mainly price-based ones): oscillators, moving averages, indices, etc.

  • Window sizes for indicators: 360, 180, 60

  • Use Bayesian Ridge Regression to determine the most relevant technical indicators

  • Train a Support Vector Regressor with Stochastic Gradient Descent to predict the price changes
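A sketch of the indicator-building step with pandas, assuming a Series of trade prices indexed by timestamp. The window size follows the slide (60 observations); the particular indicators (SMA, EMA, rate-of-change, rolling volatility) are illustrative stand-ins for the full set.

```python
# Price-based technical indicators on a synthetic 10-second price series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2016-02-01", periods=2000, freq=pd.Timedelta(seconds=10))
price = pd.Series(400 + rng.normal(scale=0.5, size=len(idx)).cumsum(), index=idx)

features = pd.DataFrame({
    "sma_60": price.rolling(60).mean(),   # simple moving average
    "ema_60": price.ewm(span=60).mean(),  # exponential moving average
    "roc_60": price.pct_change(60),       # rate-of-change oscillator
    "std_60": price.rolling(60).std(),    # rolling volatility
})
target = price.diff().shift(-1)           # next-step price change

# drop warm-up rows (no full window yet) and the final row (no next step)
features, target = features.iloc[60:-1], target.iloc[60:-1]
print(features.tail(3))
```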

BAYESIAN RIDGE REGRESSION

  • 70/30 split

  • All indicators were selected

In [9]:
a = Image(filename='images/bayesian-ridge-regression-lob_19_1.png')
b = Image(filename='images/bayesian-ridge-regression-lob_20_1.png')

display(a,b)

MID PRICE CHANGE PREDICTION RESULTS

Metric    Score
R2        0.0111
RMSE      0.4493
MAE       0.2315
DS        56.47%
In [8]:
a = Image(filename='images/bayesian-ridge-regression-lob_16_2.png')
b = Image(filename='images/bayesian-ridge-regression-lob_16_3.png')

display(a,b)

TRADE PRICE CHANGE PREDICTION RESULTS

Metric    Score
R2        0.1184
RMSE      0.4393
MAE       0.3103
DS        59.16%
In [17]:
a = Image(filename='images/bayesian-ridge-regression-trades_14_2.png')
b = Image(filename='images/bayesian-ridge-regression-trades_14_3.png')

display(a,b)

SUPPORT VECTOR REGRESSION WITH STOCHASTIC GRADIENT DESCENT

  • DATA SPLITTING:

    • 30% for hyperparameter optimisation (20% calibration, 10% validation)

    • 40% for offline training

    • 30% for testing and online training

  • NSGA-II was used for hyperparameter optimisation:

    • Minimise: RMSE

    • Maximise: DS

  • Again, all indicators were used
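A minimal sketch of the offline-then-online scheme with scikit-learn's `SGDRegressor` and the epsilon-insensitive loss (a linear SVR trained by SGD). The 40%/30% offline/online proportions follow the slide; the synthetic features, scaling step and hyperparameter values are assumptions (the hyperopt portion of the data is simply left unused here).

```python
# Linear SVR via SGD: offline fit, then one-step-ahead predict + online update.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=1000)

n = len(X)
offline = slice(int(0.3 * n), int(0.7 * n))   # 40% offline training
online = slice(int(0.7 * n), n)               # 30% testing + online training

scaler = StandardScaler().fit(X[offline])
model = SGDRegressor(loss="epsilon_insensitive", epsilon=0.01,
                     penalty="l2", alpha=1e-4, random_state=0)
model.fit(scaler.transform(X[offline]), y[offline])

# online phase: predict one step, then update with the observed value
preds = []
for i in range(online.start, online.stop):
    xi = scaler.transform(X[i:i + 1])
    preds.append(model.predict(xi)[0])
    model.partial_fit(xi, y[i:i + 1])

r2 = 1 - np.sum((y[online] - preds) ** 2) / np.sum((y[online] - y[online].mean()) ** 2)
print(f"online-phase R2: {r2:.3f}")
```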

MID PRICE CHANGE PREDICTION RESULTS

Metric    Score
R2        -0.0051
RMSE      0.4482
MAE       0.2274
DS        55.53%
In [18]:
a = Image(filename='images/support-vector-regression-lob_17_3.png')
b = Image(filename='images/support-vector-regression-lob_17_4.png')

display(a,b)

TRADE PRICE CHANGE PREDICTION RESULTS

Metric    Score
R2        0.1064
RMSE      0.4414
MAE       0.3042
DS        60.14%
In [15]:
a = Image(filename='images/support-vector-regression-trades_15_3.png')
b = Image(filename='images/support-vector-regression-trades_15_4.png')

display(a,b)

BACKTESTING - STRATEGY

  • Holding positions: +1 BTC, 0 BTC, -1 BTC

  • Buy 1 BTC when we predict price increase with a previously decreasing price (buy low)

  • Sell 1 BTC when we predict price decrease with a previously increasing price (sell high)

  • Otherwise: do nothing

BACKTESTING - RESULTS

In [19]:
a = Image(filename='images/support-vector-regression-trades_18_2.png')
b = Image(filename='images/support-vector-regression-trades_18_3.png')

display(a,b)

IMPLEMENTATION

  • Node.js scripts running on Amazon EC2 instances, collecting data and storing it in DynamoDB tables

  • Online accessible Jupyter notebooks for feature engineering and data analysis, running on Imperial's Cloudstack

  • Pandas library for large dataset manipulation

  • Scikit-learn for machine learning

  • DEAP for evolutionary computation

  • Matplotlib and Seaborn for data visualisation

CONCLUSION

  • Price of BTC depends mainly on trading activity and mining economics

  • The Bayesian way of learning seems more successful

  • Some additional evaluation is needed

  • A scalable model is yet to be developed

FUTURE WORK

  • Scalable Bayesian Ridge Regression model

  • Bayesian Neural Networks

  • Relevance Vector Machines

?

EVALUATION METRICS

  • Coefficient of Determination

$$ R^2(y, \hat{y}) = 1 - \frac{\sum_{i = 1}^n (y_i - \hat{y}_i)^2}{\sum_{i = 1}^n(y_i - \bar{y})^2}$$
  • Root-Mean-Square Error

$$ RMSE(y, \hat{y}) = \sqrt{\frac{\sum_{i = 1}^n(y_i - \hat{y}_i)^2}{n}} $$
  • Mean Absolute Error

$$ MAE(y, \hat{y}) = \frac{\sum_{i = 1}^n|y_i - \hat{y}_i|}{n}$$
  • Directional Symmetry

$$DS(y, \hat{y}) = \frac{100}{n} \sum_{i = 1}^n d_i$$

$$ d_i = \begin{cases} 1, & \quad \text{if } \operatorname{sgn}(y_i \cdot \hat{y}_i) \geq 0 \\ 0, & \quad \text{otherwise} \end{cases} $$
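The four metrics can be written out directly with NumPy; `y` and `y_hat` below are toy vectors of price changes.

```python
# Direct implementations of the evaluation metrics defined above.
import numpy as np

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def ds(y, y_hat):
    # percentage of predictions whose sign matches the true change
    return 100 * np.mean(np.sign(y * y_hat) >= 0)

y = np.array([1.0, -2.0, 0.5, -1.0])
y_hat = np.array([0.5, -1.0, -0.5, -2.0])
print(r2(y, y_hat), rmse(y, y_hat), mae(y, y_hat), ds(y, y_hat))
```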

MULTIPLE LINEAR REGRESSION

Given a vector of inputs $X^T = (x_1, x_2, \dots, x_n)$, the model predicts the output through the hypothesis $h(x)$, denoted as follows:

$$h(x) = \beta_0 + \sum_{i = 1}^n \beta_ix_i$$

where $\beta_0$ is the bias or intercept and $\beta = (\beta_1, \beta_2, \dots, \beta_n)$ is the vector of feature weights, learned by minimising a cost function. A commonly used one is the residual sum-of-squares:

$$RSS(\beta) = \sum_{i = 1}^m(y_i - X_i^T\beta)^2$$

where $m$ is the number of training examples, $y_i$ is the observed output and $X_i^T\beta$ is the compact form of the hypothesis, obtained by setting $x_0 = 1$ so that the intercept is folded into $\beta$.
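A quick numeric check of the closed-form RSS minimiser (the normal equations, $\beta = (X^TX)^{-1}X^Ty$), on synthetic data:

```python
# Solving the normal equations after prepending a column of ones (x_0 = 1).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.01, size=100)

Xb = np.hstack([np.ones((100, 1)), X])        # x_0 = 1 for the intercept
beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # normal equations
print(beta)                                   # ≈ [1.0, 2.0, -1.0, 0.5]
```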

BAYESIAN INFERENCE

$$P(H \mid D) = \frac{P(H) \times P(D \mid H)}{P(D)}$$

Given a set of competing hypotheses $H$, each explaining a data set $D$, then, for each hypothesis:

  1. Convert the prior and likelihood information in the data into probabilities
  2. Multiply them together
  3. Normalise the result to get the posterior probability of each hypothesis given the evidence

Select the most probable hypothesis
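A toy numeric walk through the three steps, with two hypothetical hypotheses for a coin after observing three heads in a row:

```python
# Bayesian inference by hand: prior x likelihood, then normalise.
priors = {"fair": 0.5, "biased": 0.5}                       # step 1: priors
likelihoods = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}        # P(3 heads | H)
unnorm = {h: priors[h] * likelihoods[h] for h in priors}    # step 2: multiply
evidence = sum(unnorm.values())
posterior = {h: p / evidence for h, p in unnorm.items()}    # step 3: normalise
best = max(posterior, key=posterior.get)                    # most probable H
print(posterior, best)
```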

BAYESIAN RIDGE REGRESSION

The frequentist formulation of ridge regression adds the L2 norm as a penalty to the standard residual sum-of-squares cost function:

$$PRSS(\beta)_{L2} = \sum_{i = 1}^m(y_i - X_i^T\beta)^2 + \lambda \sum_{j = 1}^n \beta_j^2$$

where $\lambda$ is the regularisation parameter. In the Bayesian context, using an L2 penalty is equivalent to placing a Gaussian prior on the weights:

$$\beta \sim \mathcal{N}(0, \lambda^{-1} \mathbf{I_p})$$
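The equivalence can be checked numerically: the penalised cost has the closed-form minimiser $\beta = (X^TX + \lambda I)^{-1}X^Ty$, which is also the MAP estimate under the Gaussian prior, and which scikit-learn's `Ridge` reproduces (no intercept here, for simplicity; data is synthetic).

```python
# Closed-form ridge solution vs. scikit-learn's Ridge on the same data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 1.0
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(beta_closed, beta_sklearn))
```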

SUPPORT VECTOR REGRESSION

$$\begin{aligned} \text{minimise} \quad & \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i = 1}^n(\xi_i + \xi_i^*) \\ \text{s.t.} \quad & y_i - \mathbf{w}^TX_i - b \leq \epsilon + \xi_i \\ & \mathbf{w}^TX_i + b - y_i \leq \epsilon + \xi_i^* \\ & \xi_i, \xi_i^* \geq 0, \quad \forall i = 1, \dots, n \end{aligned}$$

STOCHASTIC GRADIENT DESCENT

In [10]:
Image(filename='images/sgd.png')
Out[10]:

GENETIC ALGORITHM

In [11]:
Image(filename='images/ga.png')
Out[11]:

NETWORK FEATURES

In [12]:
Image(filename='images/netfeats.png')
Out[12]:

TECHNICAL INDICATORS

In [14]:
Image(filename='images/techinds.png')
Out[14]:
In [ ]: